We have characterized 95% (4,404 nucleotides) of the genome of adeno-associated virus type 5 (AAV5), 
including part of the terminal repeats and the terminal resolution site. Our results show that AAV5 is different 
from all other described AAV serotypes at the nucleotide level and at the amino acid level. The sequence 
homology to AAV2, AAV3B, AAV4, and AAV6 at the nucleotide level is only between 54 and 56%. The positive 
strand contains two large open reading frames (ORFs). The left ORF encodes the nonstructural (Rep) 
proteins, and the right ORF encodes the structural (Cap) proteins. At the amino acid level the identities with 
the capsid proteins of other AAVs range between 51 and 59%, with a high degree of heterogeneity in regions 
which are considered to be on the exterior surface of the viral capsid. The overall identity for the nonstructural 
Rep proteins at the amino acid level is 54.4%. It is lowest at the C-terminal 128 amino acids (10%). There are 
only two instead of the common three putative Zn fingers in the Rep proteins. The Cap protein data suggest 
differences in capsid surfaces and raise the possibility of a host range distinct from those of other parvoviruses. 
This may have important implications for AAV vectors used in gene therapy. 



Adeno-associated viruses (AAVs) are small, nonenveloped 
viruses that encapsidate single-stranded DNA of both polarities
in equal amounts. They belong to the Parvoviridae family
and are distinct from other members of this family by their 
dependence on helpers for replication (for reviews, see refer- 
ences 7-10, 31, and 37). Six primate AAV serotypes have been 
reported in the literature (2, 5, 42). They are designated types 
1 to 6 (AAV1 to AAV6). With the exception of AAV5, which 
has been isolated from a penile flat condylomatous lesion (5, 
19), all known AAVs were first found as contaminants in lab- 
oratory adenovirus stocks (1, 29, 34, 42). Up to now, the DNAs 
of AAV2, AAV3, AAV4, and AAV6 have been sequenced 
(17, 36, 41, 42, 51). The sequence identities among the differ- 
ent serotypes are high. The identities within the genomes of 
AAV2, AAV3, and AAV6 are 82%, and with AAV4 they still 
range from 75 to 78% (17, 36, 42). For AAV3, two sequences 
(designated AAV3A and 3B) have been published which have 
differences in 16 nucleotides (36, 42). In the group of autono- 
mous parvoviruses, the closest relative appears to be the goose 
parvovirus. At the genomic level and at the level of the capsid 
proteins, homologies with sequenced AAVs of ca. 54% have 
been reported (17, 36, 62), and for the nonstructural proteins 
of AAV3 the identities are ca. 44% (36). 

Two large open reading frames (ORFs) have been identified 
within the AAV genome (7, 8, 17, 36, 37, 41, 42, 51). Experi- 
mental data on translation and transcription have mainly been 
obtained for AAV2, but predictions based on nucleotide se- 
quence analogy could be made for the other AAV serotypes. 
The left ORF of AAV2 encodes the nonstructural Rep proteins,
which are expressed from two separate promoters (p5 and
p19, according to their relative map positions). The transcripts
from both promoters are translated from spliced and unspliced 
mRNAs, resulting in four proteins designated Rep78, Rep68, 
Rep52, and Rep40. The Rep proteins are regulators of AAV 
transcription; they are involved in multiple steps of AAV rep- 
lication, and they play a role in the production of single- 
stranded progeny genomes and virus assembly (7-10, 13-16, 
27, 30, 33, 37, 38, 43, 55, 56, 60). In addition, Rep proteins are 
required for site-specific integration of AAV DNA into the 
host cell genome (44, 45, 58). Furthermore, they are able to 
modulate transcription from heterologous promoters (6, 9, 21- 
23, 25, 26, 28, 32, 37, 61). The degree of sequence conservation 
for the Rep proteins is high among AAV2, AAV3A, AAV3B, 
AAV4, and AAV6. The Rep78 proteins from these viruses are 
reported to be 89 to 93% identical to each other (17, 36, 42). 
This is thought to mirror their important basic functions in the 
AAV life cycle (7-10, 17, 37). 

The AAV cap gene is located in the right half of the AAV 
genome and codes for the three capsid proteins VP1, VP2, and 
VP3, with VP3 being the smallest but most abundant; VP1 has 
the highest molecular weight but is present in a much smaller 
quantity (as shown for AAV2, AAV3, and AAV5 [7-10, 19]). 
The respective mRNA is transcribed from the p40 promoter. As 
has been shown for AAV2, the Cap proteins differ from each 
other due to alternative splicing and by the use of an unusual 
start codon (ACG) for VP2 (7-11, 37). In contrast to the Rep 
proteins, the reported degree of sequence conservation among 
the capsid proteins is lower. This is likely to provide a basis 
for differences in host range and host cell specificities (17, 36, 
37, 42). 

AAVs are interesting for several reasons. First of all, they 
have oncosuppressive properties (3, 22-26, 52, 57; for a review, 
see reference 40), and they are useful as general transduction 
vectors for gene therapeutic approaches in human cells (for 
reviews, see references 31, 37, and 48). Much work has been 
done with vectors derived from AAV2 (31, 37). However, 
vectors derived from other AAV serotypes could provide several 
additional advantages, including dependence on different 
cell receptors, resulting in transduction of different 
cell types, and resistance to neutralizing antibodies directed 
against AAV2 (17, 35, 36, 42, 53). 

Here we report the partial sequence of AAV5 covering 
1 CGCGACAGGG GGGAGAGTGC CACACTCTCA AGCAAGGGGG TTTTGTAAGC AGTGATGTCA TAATGATGTA ATGCTTATTG TCACGCGATA GTTAATGATT 

p5 

101 AACAGTCATG TGATGTGTTT TATCCAATAG GAAGAAAGCG CGCGTATGAG TTCTCGCGAG ACTTCCGGGG TATAAAAGAC CGAGTGAACG AGCCCGCCGC 

Start Rep78/68 

201 CATTCTTTGC TCTGGACTGC TAGAGGACCC TCGCTGCCAT GGCTACCTTC TATGAAGTCA TTGTTCGCGT CCCATTTGAC GTGGAGGAAC ATCTGCCTGG 

301 AATTTCTGAC AGCTTTGTGG ACTGGGTAAC TGGTCAAATT TGGGAGCTGC CTCCAGAGTC AGATTTAAAT TTGACTCTGG TTGAACAGCC TCAGTTGACG 

401 GTGGCTGATA GAATTCGCCG CGTGTTCCTG TACGAGTGGA ACAAATTTTC CAAGCAGGAG TCCAAATTCT TTGTGCAGTT TGAAAAGGGA TCTCAATATT 

501 TTCATCTGCA CACGCTTGTG GAGACCTCCG GCATCTCTTC CATGGTCCTC GGCCGCTACG TGAGTCAGAT TCGCGCCCAG CTGGTGAAAG TGGTCTTCCA 

601 GGGAATTGAA CCCCAGATCA ACGACTGGGT CGCCATCACC AAGGTAAACA AGGGCGGAGC CAATAAGGTG GTGGATTCTG GGTATATTCC CGCCTACCTG 

p19 

701 CTGCCGAAGG TCCAACCGGA GCTTCAGTGG GCGTGGACAA ACCTGGACGA GTATAAATTG GCCGCCCTGA ATCTGGAGGA GCGCAAACGG CTCGTCGCGC 

Start Rep52/40 

801 AGTTTCTGGC AGAATCCTCG CAGCGCTCGC AGGAGGCGGC TTCGCAGCGT GAGTTCTCGG CTGACCCGGT CATCAAAAGC AAGACTTCCC AGAAATACAT 

901 GGCGCTCGTC AACTGGCTCG TGGAGCACGG CATCACTTCC GAGAAGCAGT GGATCCAGGA AAATCAGGAG AGCTACCTCT CCTTCAACTC CACCGGCAAC 

1001 TCTCGGAGCC AGATCAAGGC CGCGCTCGAC AACGCGACCA AAATTATGAG TCTGACAAAA AGCGCGGTGG ACTACCTCGT GGGGAGCTCC GTTCCCGAGG 

1101 ACATTTCAAA AAACAGAATC TGGCAAATTT TTGAGATGAA TGGCTACGAC CCGGCCTACG CGGGATCCAT CCTCTACGGC TGGTGTCAGC GCTCCTTCAA 

1201 CAAGAGGAAC ACCGTCTGGC TCTACGGACC CGCCACGACC GGCAAGACCA ACATCGCGGA GGCCATCGCC CACACTGTGC CCTTTTACGG CTGCGTGAAC 

1301 TGGACCAATG AAAACTTTCC CTTTAATGAC TGTGTGGACA AAATGCTCAT TTGGTGGGAG GAGGGAAAGA TGACCAACAA GGTGGTTGAA TCCGCCAAGG 

1401 CCATCCTGGG GGGCTCAAAG GTGCGGGTCG ATCAGAAATG TAAATCCTCT GTTCAAATTG ATTCTACCCC TGTCATTGTA ACTTCCAATA CAAACATGTG 

1501 TGTGGTGGTG GATGGGAATT CCACGACCTT TGAACACCAG CAGCCGCTGG AGGACCGCAT GTTCAAATTT GAACTGACTA AGCGGCTCCC GCCAGATTTT 

1601 GGCAAGATTA CTAAGCAGGA AGTCAAGGAC TTTTTTGCTT GGGCAAAGGT CAATCAGGTG CCGGTGACTC ACGAGTTTAA AGTTCCCAGG GAATTGGCGG 

p40 

1701 GAACTAAAGG GGCGGAGAAA TCTCTAAAAC GCCCACTGGG TGACGTCACC AATACTAGCT ATAAAAGTCT GGAGAAGCGG GCCAGGCTCT CATTTGTTCC 
1801 CGAGACGCCT CGCASXTCAG ACGTGACTGT TGATCCCGCT CCTCTGCGAC CGCTCAATTG GAATTCAAGG TATGATTGCA AATGTGACTA TCATGCTCAA 

1901 TTTGACAACA TTTCTAACAA ATGTGATGAA TGTGAATATT TGAATCGGGG CAAAAATGGA TGTATCTGTC ACAATGTAAC TCACTGTCAA ATTTGTCATG 

Stop Rep78/52   splice   Start VP1 

2001 GGATTCCCCC CTGGGAAAAG GAAAACTTGT CAGATTTTGG GGATTTTGAC GATGCCAATA AAGAACAGTA AATAAAGCGA GTACTCATGT CTTTTGTTGA 

splice   Stop Rep68 

2101 TCACCCTCCA GATTGGTTGG AAGAAGTTGG TGAAGGTCTT CGCGAGTTTT TGGGCCTTGA AGCGGGCCCA CCGAAACCAA AACCCAATCA GCAGCATCAA 

2201 GATCAAGCCC GTGGTCTTGT GCTGCCTGGT TATAACTATC TCGGACCCGG AAACGGTCTC GATCGAGGAG AGCCTGTCAA CAGGGCAGAC GAGGTCGCGC 

2301 GAGAGCACGA CATCTCGTAC AACGAGCAGC TTGAGGCGGG AGACAACCCC TACCTCAAGT ACAACCACGC GGACGCCGAG TTTCAGGAGA AGCTCGCCGA 

Start VP2 

2401 CGACACATCC TTCGGGGGAA ACCTCGGAAA GGCAGTCTTT CAGGCCAAGA AAAGGGTTCT CGAACCTTTT GGCCTGGTTG AAGAGGGTGC TAAGACGGCC 

2501 CCTACCGGAA AGCGGATAGA CGACCACTTT CCAAAAAGAA AGAAGGCTCG GACCGAAGAG GACTCCAAGC CTTCCACCTC GTCAGACGCC GAAGCTGGAC 

Start VP3 

2601 CCAGCGGATC CCAGCAGCTG CAAATCCCAG CCCAACCAGC CTCAAGTTTG GGAGCTGATA CAATGTCTGC GGGAGGTGGC GGCCCATTGG GCGACAATAA 

2701 CCAAGGTGCC GATGGAGTGG GCAATGCCTC GGGAGATTGG CATTGCGATT CCACGTGGAT GGGGGACAGA GTCGTCACCA AGTCCACCCG AACCTGGGTG 

2801 CTGCCCAGCT ACAACAACCA CCAGTACCGA GAGATCAAAA GCGGCTCCGT CGACGGAAGC AACGCCAACG CCTACTTTGG ATACAGCACC CCCTGGGGGT 

2901 ACTTTGACTT TAACCGCTTC CACAGCCACT GGAGCCCCCG AGACTGGCAA AGACTCATCA ACAACTACTG GGGCTTCAGA CCCCGGTCCC TCAGAGTCAA 

3001 AATCTTCAAC ATTCAAGTCA AAGAGGTCAC GGTGCAGGAC TCCACCACCA CCATCGCCAA CAACCTCACC TCCACCGTCC AAGTGTTTAC GGACGACGAC 

3101 TACCAGCTGC CCTACGTCGT CGGCAACGGG ACCGAGGGAT GCCTGCCGGC CTTCCCTCCG CAGGTCTTTA CGCTGCCGCA GTACGGTTAC GCGACGCTGA 

3201 ACCGCGACAA CACAGAAAAT CCCACCGAGA GGAGCAGCTT CTTCTGCCTA GAGTACTTTC CCAGCAAGAT GCTGAGAACG GGCAACAACT TTGAGTTTAC 

3301 CTACAACTTT GAGGAGGTGC CCTTCCACTC CAGCTTCGCT CCCAGTCAGA ACCTCTTCAA GCTGGCCAAC CCGCTGGTGG ACCAGTACTT GTACCGCTTC 

3401 GTGAGCACAA ATAACACTGG CGGAGTCCAG TTCAACAAGA ACCTGGCCGG GAGATACGCC AACACCTACA AAAACTGGTT CCCGGGGCCC ATGGGCCGAA 

3501 CCCAGGGCTG GAACCTGGGC TCCGGGGTCA ACCGCGCCAG TGTCAGCGCC TTCGCCACGA CCAATAGGAT GGAGCTCGAG GGCGCGAGTT ACCAGGTGCC 

3601 CCCGCAGCCG AACGGCATGA CCAACAACCT CCAGGGCAGC AACACCTATG CCCTGGAGAA CACTATGATC TTGAACAGCC AGCCGGCGAA CCCGGGCACC 

FIG. 1. Partial sequence of AAV5. The 4,404 bases obtained by sequencing of AAV5 are shown. The putative promoters (p5, p19, and p40), polyadenylation signal 
(polyA), start and stop codons for nonstructural (Rep) and capsid (VP) proteins, splice sites (arrows), and the trs are underlined and marked by boldface letters. The 
position of the inverted terminal sequence counterpart of the terminal resolution site is also indicated by "trs." 

about 95% of the AAV5 genome (4,404 nucleotides [nt]), with 
the exception of the terminal hairpin structures but including 
the terminal resolution site and its inverted counterpart (50). 
The data show that the genomic organization of AAV5 is 
similar to that of other AAVs but that the interserotype homology 
is reduced to values of between 54 and 56% when 
AAV5 is included in the alignments. The differences concern 
both ORFs. The overall percentage of identical amino acids in 
the structural proteins is less than 45% and, in contrast to the 
case with other AAVs, the overall percentage of identical 
3701 ACCCCCACGT ACCTCGAGGG CAACATGCTC ATCACCAGCG AGAGCGAGAC GCAGCCGGTG AACCGCGTGG CGTACAACGT CGGCGGGCAG ATGGCCACCA 

3801 ACAACCAGAG CTCCACCACT GCCCCCGCGA CCGGCACGTA CAACCTCCAG GAAATCGTGC CCGGCAGCGT GTGGATGGAG AGGGACGTGT ACCTCCAAGG 

3901 ACCCATCTGG GCCAAGATCC CAGAGACGGG GGCGCACTTT CACCCCTCTC CGGCCATGGG CGGATTCGGA CTCAAACACC CACCGCCCAT GATGCTCATC 

4001 AAGAACACGC CTGTGCCCGG AAATATCACC AGCTTCTCGG ACGTGCCCGT CAGCAGCTTC ATCACCCAGT ACAGCACCGG GCAGGTCACC GTGGAGATGG 

4101 AGTGGGAGCT CAAGAAGGAA AACTCCAAGA GGTGGAACCC AGAGATCCAG TACACAAACA ACTACAACGA CCCCCAGTTT GTGGACTTTG CCCCGGACAG 

Stop VPs   polyA 

4201 CACCGGGGAA TACAGAACCA CCAGACCTAT CGGAACCCGA TACCTTACCC GACCCCTTTA ACCCATTCAT GTCGCATACC CTCAATAAAC CGTGTATTCG 

trs 

4301 TGTCAGTAAA ATACTGCCTC TTGTGGTCAT TCAATGAATA ACAGCTTACA ACATCTACAA AACCTCCTTG CTTGAGAGTG TGGCACTCTC CCCCCTGTCG 
4401 CGCG 

FIG. 1 — Continued. 



amino acids encoded by the left-sided ORF is also strongly 
reduced from 83.4 to 54.4%. Thus, AAV5 is clearly distinct 
from the other known AAV serotypes. 

MATERIALS AND METHODS 

Cell culture and virus stocks. HeLa cells were cultured as monolayers in 
Dulbecco's modified Eagle medium (DMEM; Sigma, Deisenhofen, Germany), 
supplemented with 5% heat-treated fetal calf serum and glutamine and penicil- 
lin-streptomycin at standard concentrations. These cell cultures were used to 
propagate AAV5 with the helper-adenovirus type 2 (5). Preparations of AAV5 
stocks were essentially done as described previously (5), except for the ammo- 
nium sulfate precipitation, which was replaced by a centrifugation step at 13,000 
rpm for 60 min in the Sorvall SS34 rotor (Sorvall Instruments). 

Preparation of viral DNA and cloning of restriction fragments. Viral DNA 
was isolated from purified virus particles by using alkaline conditions (0.1 N 
NaOH for 45 min at room temperature). The solution was neutralized, and the 
DNA was purified with the Geneclean II kit for removing proteins (Bio 101, Inc., 
La Jolla, Calif.). Restriction fragments (BamHI, EcoRI, SacI, and XhoI) were 
cloned into bacterial plasmids by standard protocols. pBluescript II(KS+) and 
pUC18 were used as bacterial vectors for cloning in the bacterial strains HB101 
and XL1-Blue. 

DNA sequencing. The major part of the sequence determination was done 
radioactively on plasmid clones of AAV5 by using the dideoxy termination 
method (46) with general vector primers and primers derived from the 3' part of 
the newly determined sequences. The terminal sequences and sequences showing 
deviations between different subclones were determined directly on the viral 
DNA by cycle sequencing by using the fluorescent dye terminator method (ABI 
PRISM Big Dye ready reaction terminator cycle sequencing kit) on a model 377 
automatic sequencer (Perkin-Elmer/Applied Biosystems) according to the manufacturer's 
protocol. The partial terminal sequences were confirmed by LION 
Bioscience, Heidelberg, Germany. 

Sequences were aligned with NCBI's MACAW program (Multiple Alignment 
Construction and Analysis Workbench). Either blocks of similarity or identities 
of the aligned sequences were shaded. 

Nucleotide sequence accession numbers. The partial AAV5 nucleotide se- 
quence determined in this study is available through EMBL databank under 
accession no. Y18065. 

RESULTS 

Nucleotide sequence and genomic organization. The partial 
AAV5 genome is presented in Fig. 1 and is also available 
through the EMBL databank. 

The major part of the sequence was determined by radioac- 
tive sequencing of cloned restriction fragments derived from 
viral DNA isolated from purified AAV5 particles. Sequence 
differences in overlapping parts of the subclones were resolved 
by fluorescent cycle sequencing directly on the single-stranded 
viral DNA. Viral DNA containing the termini could neither be 
stably propagated in bacteria nor could the terminal hairpin 
structures be sequenced by cycle sequencing, since the problem 
of polymerase stalling in the palindromic structure could not 
be resolved. 

By analogy to other published AAV sequences the sequence 
presented here (Fig. 1) includes the terminal resolution site 
(trs [50]) and part of the inverted terminal structures which 
end at the 5' and 3' ends at symmetrical positions (Fig. 2). As 
far as the sequence was determined, the 5' and 3' inverted 
repeat stretches match each other except at position 38, where 
the G is paired with a T at position 4365 instead of the expected C (Fig. 2A). 
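
The comparison of the two terminal stretches can be retraced with a short script. The following is a minimal sketch, not the procedure used in this study: it assumes that the two strings below are faithful transcriptions of nt 1 to 47 and nt 4356 to 4402 from Fig. 1, and the function names are hypothetical. It reverse-complements the 5' stretch and reports every position at which the 3' stretch deviates from the expected inverted repeat.

```python
# Sequences transcribed from Fig. 1 (assumption: transcription is error free).
FIVE_PRIME = "CGCGACAGGGGGGAGAGTGCCACACTCTCAAGCAAGGGGGTTTTGTA"   # nt 1 to 47
THREE_PRIME = "TACAAAACCTCCTTGCTTGAGAGTGTGGCACTCTCCCCCCTGTCGCG"  # nt 4356 to 4402

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def repeat_mismatches(left, right, left_start=1, right_start=4356):
    """List (left_pos, left_base, right_pos, right_base) for unpaired positions.

    `right` is compared with the reverse complement of `left`; positions are
    1-based map positions on the partial sequence.
    """
    if len(left) != len(right):
        raise ValueError("stretches must have the same length")
    expected = reverse_complement(left)
    mismatches = []
    for offset, (exp_base, obs_base) in enumerate(zip(expected, right)):
        if exp_base != obs_base:
            left_pos = left_start + len(left) - 1 - offset
            left_base = left[left_pos - left_start]
            mismatches.append((left_pos, left_base, right_start + offset, obs_base))
    return mismatches


print(repeat_mismatches(FIVE_PRIME, THREE_PRIME))
```

With these inputs the only deviation reported is the pair at positions 38 and 4365.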



A 

AAV5 nt 1-47       5' ??CGCGACAGGGGGGAGAGTGCCACACTCTCAAGCAAGGgGGTTTTGTA 3' 
AAV5 nt 4404-4356  3' GCGCGCTGTCCCCCCTCTCACGGTGTGAGAGTTCGTTCCtCCAAAACAT 5' 

B 

AAV2   5' CGCG-CAGAGAGG-GAGTGGCCA-ACTCCATCA --//-- TGATGGAGT-TGGCCACTC-CCTCTCTG-CGCG 3' 
AAV3B  5' TGCG-CATAGAGG-GAGTGGCCA-ACTCCATCA --//-- TGATGGAGT-TGGCCACTC-CCTCTATG-CGCA 3' 
AAV4   5' CGCG-CATAGAGG-GAGTGGCCA-ACTCCATCA --//-- TGATGGAGT-TGGCCACAT-TAGCTATG-CGCG 3' 
AAV6   5' CGCG-CAGAGAGG-GAGTGGCCA-ACTCCATCA --//-- TGATGGAGT-TGCCCACTC-CCTCTATG-CGCG 3' 
AAV5   5' CGCGACAGGGGGGAGAGT-GCCACACTC--TCA --//-- TGA--GAGTGTGGC-ACTCTCCCCCCTGTCGCG 3' 

FIG. 2. Partial sequence of inverted terminal repeats of AAV5. (A) Part of the inverted terminal repeats comprising the first 47 nt that could be deduced at the 
5' end (nt 1 to 47) and the last 49 nt of the 3' end (nt 4356 to 4404). The lowercase letters "g" and "t" mark the nucleotide pair at positions 38 and 4365 that does 
not match the repeat. The putative trs (50) and the respective inverted sequence are shown in boldface letters. Note that the underlined sequences may fold back in 
AAV5 DNA and form a 7-bp double-stranded loop. (B) Alignment of AAV sequences surrounding the trs and the respective inverted positions in the 5' and 3' ends 
of the DNA molecules. The trs consensus sequence and the respective inverted sequences are shown in boldface letters. Dashes indicate gaps introduced in the 
alignment. Nucleotides included in the alignment were as follows: AAV2, 104 to 133 and 4547 to 4576; AAV3B, 103 to 132 and 4590 to 4619; AAV4, 104 to 133 and 
4635 to 4664; AAV6, 104 to 133 and 4551 to 4580; AAV5 partial sequence, 1 to 30 and 4373 to 4402. 
FIG. 3. Block alignment of AAV genomic sequences. The complete sequences of AAV2, AAV3B, AAV4, and AAV6 and the partial sequence of AAV5 were 
aligned with the MACAW program. Segment pair overlap with a minimum segment pair score of 60 was used as a search method to define nucleotide blocks of local 
similarity. The aligned blocks were linked into the alignment (solid bars). To determine the start and end positions for the alignment of AAV5, the trs (50) were selected 
by sequence analogy. The scale of alignment is 5,000 nt. Nucleotide positions on the individual genomes are in relation to this scale. The panels show the alignment 
of AAV2, AAV3B, and AAV6 (A); the alignment of AAV2, AAV3B, AAV4, and AAV6 (B); and the alignment of AAV2, AAV3B, AAV4, AAV6, and AAV5 (C). 
The positions of the conserved genetic elements are marked, including the p5, p19, and p40 promoters (position of TATA boxes); the intron splice sites (GT/AG 
[arrows]); the polyadenylation signal (poly A); the translation start (up arrowheads) and stop (down arrowheads) sites for the Rep (Rep 78, 68, 52, and 40) and capsid 
(VP1 to VP3) proteins; and the terminal resolution site and its inverted counterpart (both marked trs). Note that all of the elements in panel A were within blocks 
defined by the program and that in panel B only the start of VP3 was excluded, while in panel C several were no longer in regions that met the criteria required for 
block selection [p5, p40, start Rep 78 and 68, stop Rep 78 and 52, start VP1 and VP3, the splice signals, and the poly(A) site]. Such positions were marked manually. 
Note that the p40 sites in panel C are located in stretches of high dissimilarity and were not aligned. 



The location of the TATA boxes, splice sites, and start and 
stop codons indicate a genomic organization similar to that of 
other sequenced AAVs (11, 17, 36, 41, 42, 51). Promoters are 
found around the expected map position for p5, p19, and p40, 
and there is only a single polyadenylation site at map position 
4284 (AATAAA) of the partial AAV5 sequence. Translation 
start and stop codons for all putative AAV5 nonstructural 
proteins (Rep) and capsid proteins (VPs), including the 
unusual ACG start codon for VP2 and one common stop 
codon shared by all three known capsid proteins, are at the 
expected locations. 

Although up to now we were not able to determine the 
complete terminal sequences of AAV5, the approximate 
length of the genome could be deduced. On the basis of the 
position of the trs (Fig. 2), about 100 additional nucleotides are 
expected at each end of the DNA molecule. Thus, some 150 to 
200 nt presumably would complete the 4,404 nt of the partial 
sequence. The resulting total of 4,550 to 4,600 nt is in agreement 
with the size of 4.5 to 4.6 kb derived from agarose gels and 
the mapping of restriction fragments (5). 

FIG. 4. Comparison of the left-sided ORFs of AAV. The left-sided ORFs of AAV were aligned with the MACAW program. Segment pair overlap with a minimum 
segment pair score of 60 was used as a search method to define amino acid (aa) blocks of local similarity. The aligned blocks were linked and identical amino acid 
residues were shaded. Shown are alignments of AAV2, AAV3A, AAV3B, AAV4, and AAV6 (A) and AAV2, AAV3A, AAV3B, AAV4, AAV6, and AAV5 (B). 

To obtain data on binding sites for transcription factors, we 
made use of the TRANSFAC database (59). As with other 
AAVs (8, 17, 36, 42, 51), several putative binding sequences, 
such as CCAAT, GGGCGG, and GGTGGT boxes were found 
for AAV5. Others, such as the cyclic AMP-responsive element 
(CRE; TGACGTCA [20]), were found in AAV5 but not in 
AAV2 DNA or were located at unusual map positions, e.g., the 
consensus sequence GTGACGT for the transcription factor 
EivF (18, 39). In all AAVs sequenced up to now this sequence 
was found upstream of the p5 region (8, 17, 36, 42, 51), but in 
the case of AAV5 it is at map position 1740 and thus is up- 
stream of the p40 promoter. Other recognition sites, such as 
the sequence targets for YY1 (CGACATTTT or CTCCATTTT) 
near the p5 promoter of AAV2 or the p7 promoter of 
AAV4 (17, 49), were not found in AAV5 DNA. 
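
The map positions quoted for such recognition sites can be checked by a plain string search. The sketch below is only an illustration and does not reproduce the TRANSFAC query: it assumes that the string is a faithful transcription of nt 1701 to 1800 from Fig. 1, and `find_motif` is a hypothetical helper that reports 1-based map positions.

```python
# nt 1701 to 1800 of the partial AAV5 sequence, transcribed from Fig. 1
# (assumption: transcription is error free).
ROW_1701 = (
    "GAACTAAAGGGGCGGAGAAATCTCTAAAACGCCCACTGGGTGACGTCACC"
    "AATACTAGCTATAAAAGTCTGGAGAAGCGGGCCAGGCTCTCATTTGTTCC"
)


def find_motif(seq, motif, first_nt=1):
    """Return the 1-based map positions of all (possibly overlapping) matches."""
    positions = []
    start = seq.find(motif)
    while start != -1:
        positions.append(first_nt + start)
        start = seq.find(motif, start + 1)
    return positions


for name, motif in [("EivF", "GTGACGT"), ("CRE", "TGACGTCA")]:
    print(name, motif, find_motif(ROW_1701, motif, first_nt=1701))
```

On this stretch the EivF consensus is found at map position 1740, as stated above, and the CRE overlaps it, starting one nucleotide downstream.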

Similarities among AAV genomes. To find regions of local 
similarity among the different sequences and to define blocks 
of aligned sequence segments, we made use of the MACAW 
alignment program (National Center for Biotechnology Infor- 
mation). The different multiple alignments are shown in sche- 
matic form in Fig. 3. 

When the sequences of AAV2, AAV3B, and AAV6 are 
compared (score cutoff, 60), almost the entire genomes could 
be well aligned into blocks, with the exception of small regions 
between the terminal repeats and the transcribed part of the 
genomes, some heterogeneity in the splice region, and one 
interval corresponding to AAV2 nt 3548 to 3610 (Fig. 3A). 
These results are in agreement with the 82% homology reported 
by Muramatsu et al. between AAV2 and AAV3A (36) 
and by Rutledge et al. among AAV2, AAV3B, and AAV6 (42). 

AAV4 is more distantly related to the genomes of AAV2, 
AAV3B, and AAV6 (17, 42). Thus, the inclusion of AAV4 in 
the alignment results in the extension and addition of regions 
of low similarity, especially in the right half of the genomes 
(Fig. 3B). 

Since we did not succeed in reading the entire sequence of 
the AAV5 genome, we used the terminal resolution site and 
the respective inverted sequence (50) (Fig. 2B) as reference 
positions for aligning the partial AAV5 sequence to the pub- 
lished DNA sequences of AAV2, AAV3B, AAV4, and AAV6 
(11, 17, 36, 41, 42, 51). Alignment of all five sequences resulted 
in a further reduction of regions with similarities above the 
cutoff (Fig. 3C). The scattered blocks of sequence similarity 
around the splice region in the left half of the genome (Fig. 3A 
and B) are replaced by a large interval spanning more than 400 
nt (nt 1778 to 2249 on the genome of AAV2 corresponding to 
nt 1684 to 2130 on the incomplete AAV5 genome). This region 
is also covered by a BamHI restriction fragment (5) used to 
design AAV5-specific primers (54). 

We also used pairwise alignments to find out whether AAV5 
may be more closely related to one or more of the other 
members of the AAV family. But all homologies of the four 
pairs were between 54 and 56%, and the obtained patterns 
(data not shown) were comparable to the picture shown for the 
FIG. 5. Putative amino acid sequences of the Rep ORFs. The Rep ORFs of AAV2, AAV3A, AAV3B, AAV4, AAV6, and AAV5 were compared by using the 
MACAW alignment program. Identical amino acids are shown with white letters on a black background. Dashes indicate gaps in the sequence added by the alignment 
program. The putative ATP-binding site (ATP) and Zn fingers (three pairs of black dots above the sequences for all AAVs except AAV5, and two pairs of black triangles 
below the sequences for AAV5) are marked. X, start of the two unspliced Rep proteins. 



overall alignment in Fig. 3C. Thus, AAV5 appears to be more 
distant from the rather closely related group of AAV2, AAV3, 
AAV4, and AAV6. This finding is in agreement with the hy- 
bridization data reported in 1984 (5). 

Nonstructural (Rep) protein coding region. By analogy with 
other AAVs, the AAV5 left-sided ORF encodes the putative 
unspliced nonstructural Rep proteins. The complete ORF (nt 
239 to 2068) encodes a protein of 610 amino acids, and the 
additional ATG start codon (nt 899) would give rise to a 
smaller unspliced protein of 390 amino acids (nt 899 to 2068). 
Thus, both putative unspliced AAV5 Rep proteins are slightly 
smaller than those reported for the other AAV serotypes (8, 
17, 36, 37, 42). 

When the aligned left-sided ORFs of AAV2, AAV3A, 
AAV3B, AAV4, and AAV6 were compared at the amino acid 
level, the overall identity was 83.4% (data not shown; see also 
Fig. 4A). However, the percentage of identical amino acid 
residues was reduced to 54.4% when AAV5 was included in 
the alignment (see also Fig. 4B and 5). When Fig. 4A is compared 
with Fig. 4B, it becomes evident that the number of 
identical amino acids was reduced throughout the sequence, 
but it was most striking for the sequences from amino acid 483 
of AAV5 (corresponding to residue 487 of AAV2) to the C- 
terminal amino acid 610 (621 of AAV2). Whereas without 
AAV5 the identity in this part of the aligned sequences could 
still be attributed to 91 of 135 (67.4%) of the amino acids, the 
value was only 13 identical residues of 128 (10.1%) when it was 
included (see Fig. 5). 
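
The identity values quoted here are ratios of identical residues to residues compared (for example, 91 of 135). How such a value could be derived from a group alignment is sketched below. This is an illustration under stated assumptions, not the MACAW procedure: a column counts as identical only if every sequence carries the same residue, the denominator is the ungapped length of a chosen reference sequence, and the toy sequences are invented for the example.

```python
def identical_columns(aligned):
    """Count columns in which all sequences share one residue (gaps never count)."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    return sum(
        1 for column in zip(*aligned)
        if column[0] != "-" and len(set(column)) == 1
    )


def percent_identity(aligned, reference=0):
    """Identical columns as a percentage of the ungapped reference length."""
    reference_length = sum(1 for residue in aligned[reference] if residue != "-")
    return 100.0 * identical_columns(aligned) / reference_length


# Hypothetical toy alignment: three closely related sequences ...
closely_related = ["MKTLVE", "MKTLVD", "MKSLVE"]
# ... and the same group after a more divergent sequence is added.
with_divergent = closely_related + ["MR-LAE"]

print(round(percent_identity(closely_related), 1))
print(round(percent_identity(with_divergent), 1))
```

Adding a divergent sequence can only remove columns from the identical set, which is why the group value drops whenever a distant serotype is included.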

The differences between the AAV5 noncapsid proteins and 
the respective proteins of the other AAV serotypes may affect 
the secondary structure and functional domains. While the 
putative ATP-binding site (47) is well conserved in all AAVs, 
one of the three putative Zn fingers at the carboxyl terminus 
(for a review, see reference 10) is missing in the AAV5 Rep 
proteins (Fig. 5). The distances between the three Zn-binding 
motifs (CXXH/CXXC) in AAV2 are 9, 4, and 9 amino acids. 
The respective two motifs in AAV5 have distances of 9 and 4 
amino acids. 

Capsid protein coding region. The right-sided ORF (nt 2087 
to 4258) encodes a capsid protein (VP1) of 724 amino acids, 
FIG. 6. Alignment of the right-sided cap ORFs of AAV. The right-sided ORFs of AAV were aligned with the MACAW program. Segment pair overlap with a 
minimum segment pair score of 60 was used as the search method to define amino acid blocks of local similarity. Identical amino acid residues are shown with white 
letters on a black background. VP1, VP2, and VP3 indicate the beginnings of the respective capsid proteins. 



with a predicted molecular mass of 80 kDa. The VP2 capsid (nt 
2495 to 4258), which by analogy with other AAVs is presum- 
ably initiated at an ACG triplet, and the VP3 capsid (nt 2663 
to 4258) would comprise 588 and 532 amino acids, respectively, 
with corresponding molecular masses of 65 and 59 kDa. Thus, 
all of the molecular masses predicted for the AAV5 capsid 
proteins are smaller than those reported for other AAVs (8, 
17, 36, 37, 42). 
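
The protein lengths given for the Rep and capsid ORFs follow directly from the map coordinates. The sketch below recomputes them; the coordinates are those stated in the text, whereas the average residue mass of 110 Da is a common rule of thumb and an assumption of this example, not a value taken from this study.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # assumption: rule-of-thumb average residue mass

# (first nt, last nt) on the partial AAV5 sequence, stop codon not included.
ORFS = {
    "Rep, full ORF": (239, 2068),
    "Rep, internal ATG": (899, 2068),
    "VP1": (2087, 4258),
    "VP2": (2495, 4258),
    "VP3": (2663, 4258),
}


def orf_length_aa(first_nt, last_nt):
    """Number of codons between two inclusive 1-based coordinates."""
    length_nt = last_nt - first_nt + 1
    if length_nt % 3 != 0:
        raise ValueError("coordinates do not span whole codons")
    return length_nt // 3


def approx_mass_kda(n_residues):
    """Rough molecular mass estimate from the residue count."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


for name, (first_nt, last_nt) in ORFS.items():
    n_aa = orf_length_aa(first_nt, last_nt)
    print(f"{name}: {n_aa} aa, about {approx_mass_kda(n_aa):.0f} kDa")
```

Under this approximation the three capsid proteins come out at about 80, 65, and 59 kDa, in line with the predicted masses given above.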

In a group alignment of the amino acid sequences of the 
capsid proteins of AAV2, AAV3, and AAV6, very high iden- 
tities of more than 80% were found (Table 1). The similarity 
was considerably decreased to 59.7% when AAV4 was includ- 
ed in the alignment (Table 1). The pairwise alignments yielded 
identities between 61.5% for AAV2 versus AAV4 and 64.8% 
for AAV3B versus AAV4 (Table 1). Blocks of very high similarity 
were obtained from amino acids 1 to 173 (AAV2) and 
amino acids 599 to 735 upon both pairwise and overall align- 
ments of the respective sequences. 

The results of pairwise alignments of capsid ORFs of AAV2, 
AAV3A, AAV3B, AAV4, and AAV6 to AAV5 showed iden- 
tities between 51.4% for AAV4 versus AAV5 and 58.8% for 
AAV6 versus AAV5 (Table 1), and the alignment of all six 
capsid ORFs resulted in a reduction of overall amino acid 
identity to 44.9% (Fig. 6; Table 1). This correlates with the 
increased heterogeneity seen upon alignment of the respective 
nucleotide sequences (Fig. 3C). 

As can be seen from the results listed in Table 1, including 
either AAV4 or AAV5 in the group alignments strongly re- 
duced the percentage of identities between the capsid ORFs of 
the AAV serotypes 2, 3A, 3B, and 6 (from 80.1 to 59.7 and 



946 BANTEL-SCHAAL ET AL. 



J. Virol. 



TABLE 1. Percentage of identical amino acids in the right-sided 
("cap") ORF derived from AAV alignments^a 

                                                      % Identical amino acids 
                                                  Pairwise alignments 
AAV serotype(s)                                   Aligned to AAV4   Aligned to AAV5   Group alignment 

Pairwise 
  AAV2                                                 61.5              58.3 
  AAV3A                                                63.9              58.4 
  AAV3B                                                64.8              58.7 
  AAV4                                                100                51.4 
  AAV6                                                 64.6              58.8 

Group 
  AAV2, AAV3A, AAV3B, and AAV6                                                             80.1 
  AAV2, AAV3A, AAV3B, and AAV6 versus AAV4                                                 59.7 
  AAV2, AAV3A, AAV3B, and AAV6 versus AAV5                                                 53.2 
  AAV2, AAV3A, AAV3B, and AAV6 versus AAV4 and AAV5                                        44.9 

^a The right-sided AAV ORFs were aligned either pairwise or in the indicated 
combinations with the MACAW program, and the identical amino acids were 
determined. 



53.2%, respectively). However, the pairwise alignment of 
the capsid ORFs of AAV4 and AAV5 already yields a lower 
degree of similarity (51.4%; Table 1). Thus, the capsid ORFs 
of both AAV4 and AAV5 differ from the other serotypes 
as well as from each other. Therefore, the percentage of 
identical amino acids decreases further (44.9%) when both 
AAV4 and AAV5 are included in the group alignment (Ta- 
ble 1). 

DISCUSSION 

We have determined the sequence of 4,404 nt (about 95% of 
the estimated size of 4.5 to 4.6 kb) of the AAV5 genome, but 
up to now we were not able to resolve the complete sequence 
of the terminal repeats. Using the trs (50), the respective in- 
verted sequence, and the TATAA box in the p5 promoter as 
reference points, we have made alignments of the AAV5 se- 
quence to the published sequences of AAV2, AAV3B, AAV4, 
and AAV6 (17, 36, 41, 42, 51). The overall organization within 
the AAV5 genome is similar to that of other AAVs with three 
putative promoters, one single polyadenylation site, and two 
large ORFs. 

As reported by others, the identities on the nucleotide level 
between AAV2, AAV3, and AAV6 exceeded 82% and were 
still as high as 75% when AAV4 was included (17, 36, 42). In 
contrast to the good sequence conservation between the other 
AAV serotypes, AAV5 is clearly more distantly related, and 
the overall identity is reduced to about 56% when its nucleo- 
tide sequence is included in the alignment. Thus, the distance 
between AAV5 and the other AAV serotypes is in the same 
range as that reported for the other AAVs compared to their 
closest relatives among autonomous parvoviruses, namely, the 
goose and duck parvoviruses (17, 36, 62). 

The differences in the nucleotide sequence near the p5 and 
the p40 promoters appear to be connected to differences in the 
adenovirus transcription factor recognition sites. Thus, the 
EivF element involved in E1A-responsive transactivation of 
the adenovirus E4 promoter (18, 39) is shifted from the common 
region upstream of the p5 promoter to a position upstream 
of the p40 promoter in AAV5, and the YY1 sites (49) 
are not found in the AAV5 genome. If and how these consen- 
sus sequences and other sequences (e.g., the CRE element 
[20]) are involved in adenovirus transactivation remains to be 
determined. 

Within the coding regions, stretches of dissimilarity were 
most prominent at AAV5 nt 1684 to 2130 (AAV2 nt 1778 to 
2249), 2504 to 2692 (AAV2 nt 2623 to 2839), and 3393 to 3848 
(AAV2 nt 3529 to 3993). The last two affect all three capsid 
proteins. Thus, the capsid proteins of AAV5 differ significantly 
from the respective proteins of the other serotypes, and the 
overall percentage of identical amino acids is less than 45% 
(Table 1). Since the respective differences comprise regions 
which are supposed to be exposed at the virus surface (12), 
tissue tropism, cellular receptor(s), and resistance towards 
AAV neutralizing antibodies might be different from those of 
other AAVs. Thus, the host range of AAV5 may be distinct. 
This should be of interest for the construction of AAV gene 
transduction vectors and for their application in gene therapy 
(31, 37, 48). 

The region of nucleotide sequence dissimilarity at AAV5 
map positions 1684 to 2130 (AAV2 nt 1778 to 2249) reflects 
differences in the Rep proteins. This certainly represents the 
most striking finding, since the sequence of Rep proteins was 
thought to be highly conserved because of the essential func- 
tions that these proteins have in the AAV life cycle. Whether 
the altered sequence of the AAV5 Rep proteins and the lack 
of a third Zn finger motif lead to structurally and functionally 
different proteins remains to be elucidated. It is noteworthy 
that the alignment of each of the AAV sequences to the re- 
spective sequences of goose parvovirus (4, 62) showed a high 
sequence conservation around the Rep ATP-binding site (47). 
This result (4) is in agreement with the high degree of se- 
quence similarity seen at the respective positions when the Rep 
ORFs of all AAVs are aligned (Fig. 5). Thus, while the C-ter- 
minal sequences and the Zn fingers may be more flexible, the 
high degree of conservation of the ATP-binding site suggests 
its unique structure and points to its central role in the viral life 
cycle. 